Dynamic Patterns of Public Transport Usage

Author

Wan Kee

Published

November 25, 2023

A Geospatial Analysis of Bus Passenger Volume in Urban Environments

1.1 Overview

In the intricate mosaic of urban transportation, bus stops serve as pivotal nodes that capture the pulse of city life through the ebb and flow of passenger trips. The study of passenger trip volume, particularly during peak hours, becomes essential for enhancing service efficiency, planning urban infrastructure, and improving the overall commuter experience.

This geospatial analysis is anchored in the bustling landscape of a metropolitan area, where data on bus stops, population distribution, urban development plans, and passenger volume intertwine to paint a comprehensive picture of transit dynamics.

This analysis aims to dissect the rhythms of urban mobility in Singapore through the following analysis:

  1. Geovisualization and Analysis

    • Compute the passenger trips
    • Visualize the geographical distribution of the passenger trips
  2. Local Indicators of Spatial Association (LISA) Analysis

    • Compute LISA of the passengers trips
    • Visualize the LISA maps of the passengers trips
  3. Emerging Hot Spot Analysis (EHSA)

    • Perform Mann-Kendall Test by using the spatio-temporal local Gi* values
    • Visualize EHSA maps of the Gi* values of the passenger trips by origin at the hexagon level

1.2 Load packages

The following packages will be useful

  • sf imports and handles geospatial data
  • tidyverse performs aspatial data import, wrangling and visualization
  • dplyr perform relational join
  • spdep compute spatial weights and spatially lagged variables
  • spfep compute spatial autocorrelation
  • ggplot2, mapview and tmap supports data visualisation
pacman::p_load(sf, dplyr, ggplot2, knitr, mapview, spdep, tmap, tidyverse)

1.3 Import data

We will be using the following geospatial (busstops, mpsz) and aspatial (odbus, pop) datasets.

st_geometry() displays basic information of the feature class, such as type of geometry, the geographic extent of the features and the coordinate system of the data. glimpse() transposes the columns in a dataset and makes it possible to see the column name, data type and values in every column in a data frame.

busstops contains the detailed information for all bus stops currently serviced by buses, including bus stop code, road name, description, location coordinates.

Source: LTA DataMall (Postman URL)

Show the code
busstops <- st_read(dsn = "data/BusStopLocation_Jul2023", layer = "BusStop") %>%
  st_transform(crs = 3414)
Reading layer `BusStop' from data source 
  `/Users/chockwankee/Documents/chockwk/ISSS624_Geospatial_Analytics/Take_home_Ex/Take_home_Ex01/data/BusStopLocation_Jul2023' 
  using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
Show the code
st_geometry(busstops)
Geometry set for 5161 features 
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21 / Singapore TM
First 5 geometries:

The output indicates that the geospatial objects are point features. There are 5161 features and 3 fields. It is in SVY21 projected coordinates system with XY dimension.

mpsz is the Master Plan 2019, a forward looking guiding plan for Singapore’s development in the medium term over the next 10 to 15 years published in 2019. Note this mpsz differs from that in previous chapter, Data Wrangling.

Source: URA (Download here)

Show the code
mpsz = st_read(dsn="data/MPSZ-2019", layer="MPSZ-2019") %>% 
  st_transform(crs=3414)
Reading layer `MPSZ-2019' from data source 
  `/Users/chockwankee/Documents/chockwk/ISSS624_Geospatial_Analytics/Take_home_Ex/Take_home_Ex01/data/MPSZ-2019' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84

The output indicates that the geospatial objects are multipolygon features. There are 332 features and 6 fields. It is in WGS84 projected coordinates system with XY dimension.

odbus contains the number of trips by weekdays and weekends from origin to destination bus stops. It reflects the passenger trip traffic and the most recent dataset from October 2023 will be used.

Source: LTA DataMall (Postman URL)

Show the code
odbus = read_csv("data/origin_destination_bus_202310.csv")
Show the code
odbus$ORIGIN_PT_CODE <- as.factor(odbus$ORIGIN_PT_CODE)
odbus$DESTINATION_PT_CODE <- as.factor(odbus$DESTINATION_PT_CODE)
glimpse(odbus)
Rows: 5,694,297
Columns: 7
$ YEAR_MONTH          <chr> "2023-10", "2023-10", "2023-10", "2023-10", "2023-…
$ DAY_TYPE            <chr> "WEEKENDS/HOLIDAY", "WEEKDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR       <dbl> 16, 16, 14, 14, 17, 17, 17, 7, 14, 14, 10, 20, 20,…
$ PT_TYPE             <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE      <fct> 04168, 04168, 80119, 80119, 44069, 20281, 20281, 1…
$ DESTINATION_PT_CODE <fct> 10051, 10051, 90079, 90079, 17229, 20141, 20141, 1…
$ TOTAL_TRIPS         <dbl> 3, 5, 3, 5, 4, 1, 24, 2, 1, 7, 3, 2, 5, 1, 1, 1, 1…

The output indicates 5,694,297 records and 7 fields. The bus stop codes are converted into factor for data handling.

pop contains Singapore residents grouped by planning area or subzone, age group, sex and floor area of residence. The data period is from June 2011 onwards. From June 2011 to 2019, the planning areas refer to areas demarcated in the Master Plan 2014, and from June 2020 onwards will be Master Plan 2019.

Source: Department of Statistics (Link)

Show the code
pop <- read_csv("data/respopagesexfa2011to2020/respopagesexfa2011to2020.csv")
Show the code
glimpse(pop)
Rows: 738,492
Columns: 7
$ PA   <chr> "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo K…
$ SZ   <chr> "Ang Mo Kio Town Centre", "Ang Mo Kio Town Centre", "Ang Mo Kio T…
$ AG   <chr> "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to…
$ Sex  <chr> "Males", "Males", "Males", "Males", "Males", "Males", "Females", …
$ FA   <chr> "<= 60", ">60 to 80", ">80 to 100", ">100 to 120", ">120", "Not A…
$ Pop  <dbl> 0, 10, 30, 80, 20, 0, 0, 10, 40, 90, 10, 0, 0, 10, 30, 110, 30, 0…
$ Time <dbl> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011,…

The output indicates 738,492 records and 7 fields to illustrate the population distribution by planning area (PA), subzone (SZ), age group (AG), residence floor area (FA), resident count (Pop).

1.4 Create Spatial Grids

Spatial grids are commonly used in spatial analysis to divide the study area into equal size, regular polygons that tessellate the area of interest. Square grid rarely reflect real world situations, whereas hexagons are compact and can overcome oddly-shaped geographical units.

hexagon is a hexagon layer of 250m where the distance is the perpendicular distance between the centre of the hexagon and its edges. It is a substitute for multipolygon in mpsz which is relatively coarse and irregular.

Step 1

First, we will create hexagonal grids using st_make_grid() to cover the geometry of busstops. We also assign grid id to each hexagon using length() to calculate the length of the geometry and generate a sequence from 1 to the length of vectors in area_hexagon_grid.

Show the code
area_hexagon_grid = st_make_grid(busstops, cellsize = 500, what = "polygons", square = FALSE)

hexagon_grid_sf = st_sf(area_hexagon_grid) %>% 
  mutate(grid_id = 1:length(lengths(area_hexagon_grid)))
hexagon_grid_sf
Simple feature collection with 5580 features and 1 field
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 3470.122 ymin: 26193.43 xmax: 48720.12 ymax: 53184.55
Projected CRS: SVY21 / Singapore TM
First 10 features:
                area_hexagon_grid grid_id
1  POLYGON ((3720.122 26626.44...       1
2  POLYGON ((3720.122 27492.46...       2
3  POLYGON ((3720.122 28358.49...       3
4  POLYGON ((3720.122 29224.51...       4
5  POLYGON ((3720.122 30090.54...       5
6  POLYGON ((3720.122 30956.57...       6
7  POLYGON ((3720.122 31822.59...       7
8  POLYGON ((3720.122 32688.62...       8
9  POLYGON ((3720.122 33554.64...       9
10 POLYGON ((3720.122 34420.67...      10

The output indicates that the geospatial objects are polygon features. There are 5580 features and 1 fields. It is in SVY21 projected coordinates system with XY dimension, same as that in busstops dataset.

Step 2

Second, st_intersects() combines the hexagon object hexagon_grid_sf and busstops and length() counts the number of bus stops in each grid. The grids without any bus stops will be removed by filter() to reflect the distribution in reality.

Show the code
hexagon_grid_sf$grid_id = lengths(st_intersects(hexagon_grid_sf, busstops))
hexagon_count = filter(hexagon_grid_sf, grid_id > 0)
hexagon_count
Simple feature collection with 1524 features and 1 field
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 3720.122 ymin: 26193.43 xmax: 48720.12 ymax: 53184.55
Projected CRS: SVY21 / Singapore TM
First 10 features:
                area_hexagon_grid grid_id
1  POLYGON ((3970.122 27925.48...       1
2  POLYGON ((4220.122 28358.49...       1
3  POLYGON ((4470.122 30523.55...       1
4  POLYGON ((4720.122 28358.49...       1
5  POLYGON ((4720.122 30090.54...       2
6  POLYGON ((4720.122 30956.57...       1
7  POLYGON ((4720.122 31822.59...       1
8  POLYGON ((4970.122 28791.5,...       1
9  POLYGON ((4970.122 29657.53...       1
10 POLYGON ((4970.122 30523.55...       2

The output indicates that the geospatial objects retained polygon features and there are more than one polygon feature in a grid_id. The intersection of busstops and hexagon_grid_sf yields 1524 features and 1 fields, indicating only 1524 out of 5580 features contains bus stops. It is in SVY21 projected coordinates system with XY dimension.

Step 3

Using tmap, the distribution of bus stops across Singapore can be visualised in the interactive map below.

Show the code
tmap_mode("view")
hexagon = tm_shape(hexagon_count)+
  tm_fill(
    col = "grid_id",
    palette = "Reds",
    style = "cont",
    title = "Number of Bus Stops in Singapore",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6,
    popup.format = list(
      grid_id = list(format = "f", digits = 0)
    )
  )+
  tm_borders(col = "grey40", lwd = 0.7)
hexagon

The geospatial distribution of bus stops in Singapore, as visualized in the above map, is extensive and dispersed throughout the central region, with a notable concentration of stops with darker shades of red signifying higher concentrations. Outlying regions, along the coastal areas, show fewer bus stops as indicated by the presence of lighter-colored hexagons or even white spaces where no stops are present. This distribution pattern suggests a robust public transportation infrastructure in urban and densely populated areas, tapering off in less populated or industrial regions.

1.5 Explore data

mapview creates interactive visualisations of spatial data to examine and visually investigate both aspects of spatial data, the geometries and their attributes.

plot() takes parameters for specifying points in the diagram. At its simplest, it can plot two numbers against each other. With datasets, it can generate maps and plot the specified columns/attributes, with default up to nine plots or maximum all plots.

mapview(busstops, cex = 3, alpha = 0.5, popup = NULL)
# Filter hexagons that contain more than 8 bus stops
hexagon_red = filter(hexagon_grid_sf, grid_id>8)

tmap_mode("view")
redhex = tm_shape(hexagon_red)+
  tm_fill(
    col = "grid_id",
    palette = "Reds",
    style = "cont",
    title = "Number of Bus Stops",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6,
    popup.format = list(
      grid_id = list(format = "f", digits = 0)
    )
  )+
  tm_borders(col = "grey40", lwd = 0.7)
redhex

Sembawang MRT (9 bus stops)

Pasir Ris (11 bus stops)

# Summarize the total trips by day type
total_trips <- odbus %>%
  group_by(DAY_TYPE) %>%
  summarise(total = sum(TOTAL_TRIPS))

# Plot using ggplot2
ggplot(total_trips, aes(x = DAY_TYPE, y = total, fill = DAY_TYPE)) +
  geom_bar(stat = "identity") +
  theme_minimal() +
  labs(title = "Total Trips by Day Type", x = "Day Type", y = "Total Trips")

plot(mpsz["PLN_AREA_N"])

1.6 Geovisualisation and Analysis

1.6.2 Compute the passenger trips generated by origin

Step 1: Classify by time interval

# Function to assign peak and non-peak times
time_interval <- function(day_type, time_per_hour) {
  if (day_type == "WEEKDAY") {
    if (time_per_hour >= 6 & time_per_hour <= 9) {
      "morning peak"
    } else if (time_per_hour >= 17 & time_per_hour <= 20) {
      "afternoon peak"
    } else {
      "non peak"
    }
  } else if (day_type == "WEEKENDS/HOLIDAY") {
    if (time_per_hour >= 11 & time_per_hour <= 14) {
      "morning peak"
    } else if (time_per_hour >= 16 & time_per_hour <= 19) {
      "evening peak"
    } else {
      "non peak"
    }
  } else {
    "non peak"
  }
}

# Assuming 'odbus' is your data frame
odbus$TIME <- mapply(time_interval, odbus$DAY_TYPE, odbus$TIME_PER_HOUR)

# Checking the first few rows of the data frame to verify the new column
head(odbus)
# A tibble: 6 × 8
  YEAR_MONTH DAY_TYPE         TIME_PER_H…¹ PT_TYPE ORIGI…² DESTI…³ TOTAL…⁴ TIME 
  <chr>      <chr>                   <dbl> <chr>   <fct>   <fct>     <dbl> <chr>
1 2023-10    WEEKENDS/HOLIDAY           16 BUS     04168   10051         3 even…
2 2023-10    WEEKDAY                    16 BUS     04168   10051         5 non …
3 2023-10    WEEKENDS/HOLIDAY           14 BUS     80119   90079         3 morn…
4 2023-10    WEEKDAY                    14 BUS     80119   90079         5 non …
5 2023-10    WEEKDAY                    17 BUS     44069   17229         4 afte…
6 2023-10    WEEKENDS/HOLIDAY           17 BUS     20281   20141         1 even…
# … with abbreviated variable names ¹​TIME_PER_HOUR, ²​ORIGIN_PT_CODE,
#   ³​DESTINATION_PT_CODE, ⁴​TOTAL_TRIPS

Step 2: Join datasets

passengertrips <- left_join(busstops, odbus, 
                            by = c("BUS_STOP_N" = "ORIGIN_PT_CODE"))
passengertrips_day <- passengertrips %>%
  group_by(BUS_STOP_N, BUS_ROOF_N, LOC_DESC, YEAR_MONTH, geometry) %>%
  summarise(
    WEEKDAY_TRIPS = sum(TOTAL_TRIPS[DAY_TYPE == "WEEKDAY"], na.rm = TRUE),
    WEEKENDS_HOLIDAYS_TRIPS = sum(TOTAL_TRIPS[DAY_TYPE != "WEEKDAY"], na.rm = TRUE),
    .groups = "drop"
  )

Determine peak or non peak

passengertrips_time <- passengertrips %>%
  group_by(BUS_STOP_N, BUS_ROOF_N, LOC_DESC, YEAR_MONTH, geometry) %>%
  summarise(
    WEEKDAY_MORNING_PEAK = sum(TOTAL_TRIPS[DAY_TYPE == "WEEKDAY" & TIME == "morning peak"], na.rm = TRUE),
    WEEKDAY_AFTERNOON_PEAK = sum(TOTAL_TRIPS[DAY_TYPE == "WEEKDAY" & TIME == "afternoon peak"], na.rm = TRUE),
    WEEKDAY_NON_PEAK = sum(TOTAL_TRIPS[DAY_TYPE == "WEEKDAY" & TIME == "non peak"], na.rm = TRUE),
    WEEKENDS_HOLIDAYS_MORNING_PEAK = sum(TOTAL_TRIPS[DAY_TYPE != "WEEKDAY" & TIME == "morning peak"], na.rm = TRUE),
    WEEKENDS_HOLIDAYS_EVENING_PEAK = sum(TOTAL_TRIPS[DAY_TYPE != "WEEKDAY" & TIME == "evening peak"], na.rm = TRUE),
    WEEKENDS_HOLIDAYS_NON_PEAK = sum(TOTAL_TRIPS[DAY_TYPE != "WEEKDAY" & TIME == "non peak"], na.rm = TRUE),
    .groups = "drop"
  )

passengertrips_time <- passengertrips_time %>% 
  filter(!(WEEKDAY_MORNING_PEAK == 0 
           & WEEKDAY_AFTERNOON_PEAK == 0
           & WEEKDAY_NON_PEAK == 0
           & WEEKENDS_HOLIDAYS_MORNING_PEAK == 0
           & WEEKENDS_HOLIDAYS_EVENING_PEAK == 0
           & WEEKENDS_HOLIDAYS_NON_PEAK == 0))

Step 3: Ensure both datasets are in the same coordinate reference system (CRS).

st_crs(passengertrips_day)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(passengertrips_time)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(hexagon_grid_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

Step 4: Perform a spatial join to match trips to hexagons

passengergrid_day <- st_join(hexagon_grid_sf, passengertrips_day, join = st_intersects)
# Remove rows with no trips (NA values)
passengergrid_day <- passengergrid_day %>% 
  filter(!is.na(BUS_STOP_N))
glimpse(passengergrid_day)
Rows: 5,161
Columns: 8
$ grid_id                 <int> 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1…
$ BUS_STOP_N              <chr> "25059", "25751", "26379", "25761", "25719", "…
$ BUS_ROOF_N              <chr> "UNK", "B02D", "NIL", "B03", "B01C", "NIL", "N…
$ LOC_DESC                <chr> "AFT TUAS STH BLVD", "BEF TUAS STH AVE 14", "Y…
$ YEAR_MONTH              <chr> "2023-10", "2023-10", "2023-10", "2023-10", "2…
$ WEEKDAY_TRIPS           <dbl> 526, 468, 1055, 3420, 4584, 944, 689, 768, 644…
$ WEEKENDS_HOLIDAYS_TRIPS <dbl> 105, 96, 247, 816, 1688, 379, 248, 147, 108, 3…
$ area_hexagon_grid       <POLYGON [m]> POLYGON ((3970.122 27925.48..., POLYGO…
passengergrid_time <- st_join(hexagon_grid_sf, passengertrips_time, join = st_intersects)
# Remove rows with no trips (NA values)
passengergrid_time <- passengergrid_time %>% 
  filter(!is.na(BUS_STOP_N))
glimpse(passengergrid_time)
Rows: 5,029
Columns: 12
$ grid_id                        <int> 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2, 2, …
$ BUS_STOP_N                     <chr> "25059", "25751", "26379", "25761", "25…
$ BUS_ROOF_N                     <chr> "UNK", "B02D", "NIL", "B03", "B01C", "N…
$ LOC_DESC                       <chr> "AFT TUAS STH BLVD", "BEF TUAS STH AVE …
$ YEAR_MONTH                     <chr> "2023-10", "2023-10", "2023-10", "2023-…
$ WEEKDAY_MORNING_PEAK           <dbl> 103, 52, 78, 185, 815, 301, 53, 60, 64,…
$ WEEKDAY_AFTERNOON_PEAK         <dbl> 390, 114, 291, 1905, 2600, 299, 241, 36…
$ WEEKDAY_NON_PEAK               <dbl> 33, 302, 686, 1330, 1169, 344, 395, 340…
$ WEEKENDS_HOLIDAYS_MORNING_PEAK <dbl> 0, 26, 52, 187, 367, 88, 76, 45, 21, 39…
$ WEEKENDS_HOLIDAYS_EVENING_PEAK <dbl> 56, 14, 100, 346, 533, 101, 55, 49, 53,…
$ WEEKENDS_HOLIDAYS_NON_PEAK     <dbl> 49, 56, 95, 283, 788, 190, 117, 53, 34,…
$ area_hexagon_grid              <POLYGON [m]> POLYGON ((3970.122 27925.48...,…

Step 5: Split passengers trip into weekday and weekend

::: panel-tabset #### Day

# Subset for Weekday
weekday_trips <- passengergrid_day %>%
  group_by(BUS_STOP_N) %>%
  summarise(
    weekday_trips = sum(WEEKDAY_TRIPS, na.rm = TRUE),
  )

# Subset for Weekend
weekend_trips <- passengergrid_day %>%
  group_by(BUS_STOP_N) %>%
  summarise(
    weekend_trips = sum(WEEKENDS_HOLIDAYS_TRIPS, na.rm = TRUE),
  )

Time

# First, ensure all necessary columns are present in the dataframe
passengergrid_clean <- passengergrid_time %>%
  mutate(
    WEEKDAY_MORNING_PEAK = ifelse(is.na(WEEKDAY_MORNING_PEAK), 0, WEEKDAY_MORNING_PEAK),
    WEEKDAY_AFTERNOON_PEAK = ifelse(is.na(WEEKDAY_AFTERNOON_PEAK), 0, WEEKDAY_AFTERNOON_PEAK),
    WEEKENDS_HOLIDAYS_MORNING_PEAK = ifelse(is.na(WEEKENDS_HOLIDAYS_MORNING_PEAK), 0, WEEKENDS_HOLIDAYS_MORNING_PEAK),
    WEEKENDS_HOLIDAYS_EVENING_PEAK = ifelse(is.na(WEEKENDS_HOLIDAYS_EVENING_PEAK), 0, WEEKENDS_HOLIDAYS_EVENING_PEAK)
  )

# Function to summarise and filter bus stops with no trips
summarise_and_filter <- function(data, column) {
  data %>%
    group_by(BUS_STOP_N) %>%
    summarise(Total_Trips = sum({{ column }}, na.rm = TRUE)) %>%
    filter(Total_Trips > 0) %>%
    ungroup()
}

# Create subsets using the function
weekday_morning_peak <- summarise_and_filter(passengergrid_clean, 
                                             WEEKDAY_MORNING_PEAK)

weekday_afternoon_peak <- summarise_and_filter(passengergrid_clean, 
                                               WEEKDAY_AFTERNOON_PEAK)

weekend_morning_peak <- summarise_and_filter(passengergrid_clean, 
                                             WEEKENDS_HOLIDAYS_MORNING_PEAK)

weekend_evening_peak <- summarise_and_filter(passengergrid_clean, 
                                             WEEKENDS_HOLIDAYS_EVENING_PEAK)

# Check for any BUS_STOP_N that might still have issues
problematic_stops <- passengergrid_clean %>%
  filter(is.na(BUS_STOP_N)) %>%
  pull(BUS_STOP_N) %>%
  unique()

# If problematic_stops has any values, you may need to address these specifically,
# for example by removing them from passengergrid_clean before creating the subsets
if (length(problematic_stops) > 0) {
  passengergrid_clean <- passengergrid_clean %>%
    filter(!(BUS_STOP_N %in% problematic_stops))
}

Step 6: Plot passenger trips traffic

# Convert your data to an sf object if it's not one already
weekday_sf <- st_as_sf(weekday_trips, coords = c("longitude", "latitude"), crs = 4326) 

# Calculate breaks at intervals of 25,000
max_trips <- max(weekday_sf$weekday_trips, na.rm = TRUE)
breaks <- c(1, 2500, 5000, 7500, 10000, 25000, 50000, Inf)

# Print the map
tmap_mode("view")
tm_shape(weekday_sf) +
  tm_polygons("weekday_trips", 
              palette = "Blues", 
              border.col = "grey40",
              breaks = breaks,
              title = "Weekday Trips") +
  tm_scale_bar()+
  tm_layout(main.title = "Weekday Trips by Bus Stop", 
            main.title.position = "center", 
            frame = FALSE)
# Convert your data to an sf object if it's not one already
weekday_morning_peak_sf <- st_as_sf(weekday_morning_peak, coords = c("longitude", "latitude"), crs = 4326) 

# Calculate breaks at intervals of 25,000
max_trips <- max(weekday_morning_peak_sf$Total_Trips, na.rm = TRUE)
breaks <- c(1, 2500, 5000, 7500, 10000, 25000, 50000, Inf)

# Print the map
tmap_mode("view")
tm_shape(weekday_morning_peak_sf) +
  tm_polygons("Total_Trips", 
              palette = "Blues", 
              border.col = "grey40",
              breaks = breaks,
              title = "weekday_morning_peak Trips") +
  tm_scale_bar()+
  tm_layout(main.title = "weekday_morning_peak Trips by Bus Stop", 
            main.title.position = "center", 
            frame = FALSE)
# Convert your data to an sf object if it's not one already
weekday_afternoon_peak_sf <- st_as_sf(weekday_afternoon_peak, coords = c("longitude", "latitude"), crs = 4326) 

# Calculate breaks at intervals of 25,000
max_trips <- max(weekday_afternoon_peak_sf$Total_Trips, na.rm = TRUE)
breaks <- c(1, 2500, 5000, 7500, 10000, 25000, 50000, Inf)

# Print the map
tmap_mode("view")
tm_shape(weekday_afternoon_peak_sf) +
  tm_polygons("Total_Trips", 
              palette = "Blues", 
              border.col = "grey40",
              breaks = breaks,
              title = "Weekday Afternoon Peak Trips") +
  tm_scale_bar()+
  tm_layout(main.title = "Weekday Afternoon Peak Trips by Bus Stop", 
            main.title.position = "center", 
            frame = FALSE)
# Convert your data to an sf object if it's not one already
weekend_sf <- st_as_sf(weekend_trips, coords = c("longitude", "latitude"), crs = 4326) 

# Calculate breaks at intervals of 25,000
max_trips <- max(weekend_sf$weekend_trips, na.rm = TRUE)
breaks <- c(1, 2500, 5000, 7500, 10000, 25000, 50000, Inf)

# Print the map
tmap_mode("view")
tm_shape(weekend_sf) +
  tm_polygons("weekend_trips", 
              palette = "Purples", 
              border.col = "grey40",
              breaks = breaks,
              title = "Weekend Trips") +
  tm_scale_bar()+
  tm_layout(main.title = "Weekend Trips by Bus Stop", 
            main.title.position = "center", 
            frame = FALSE)
# Convert your data to an sf object if it's not one already
weekend_morning_peak_sf <- st_as_sf(weekend_morning_peak, coords = c("longitude", "latitude"), crs = 4326) 

# Calculate breaks at intervals of 25,000
max_trips <- max(weekend_morning_peak_sf$Total_Trips, na.rm = TRUE)
breaks <- c(1, 2500, 5000, 7500, 10000, 25000, 50000, Inf)

# Print the map
tmap_mode("view")
tm_shape(weekend_morning_peak_sf) +
  tm_polygons("Total_Trips", 
              palette = "Purples", 
              border.col = "grey40",
              breaks = breaks,
              title = "Weekend Morning Peak Trips") +
  tm_scale_bar()+
  tm_layout(main.title = "Weekend Morning Peak Trips by Bus Stop", 
            main.title.position = "center", 
            frame = FALSE)
# Convert your data to an sf object if it's not one already
weekend_evening_peak_sf <- st_as_sf(weekend_evening_peak, coords = c("longitude", "latitude"), crs = 4326) 

# Calculate breaks at intervals of 25,000
max_trips <- max(weekend_evening_peak_sf$Total_Trips, na.rm = TRUE)
breaks <- c(1, 2500, 5000, 7500, 10000, 25000, 50000, Inf)

# Print the map
tmap_mode("view")
tm_shape(weekend_evening_peak_sf) +
  tm_polygons("Total_Trips", 
              palette = "Purples", 
              border.col = "grey40",
              breaks = breaks,
              title = "Weekend Evening Peak Trips") +
  tm_scale_bar()+
  tm_layout(main.title = "Weekend Evening Peak Trips by Bus Stop", 
            main.title.position = "center", 
            frame = FALSE)

1.6 Local Indicators of Spatial Association (LISA) Analysis

1.6.1 Compute LISA of the passengers trips generate by origin at hexagon level.

Method 1: Computing Contiguity Spatial Weights

poly2nb defines the spatial relationship between different regions by computing contiguity weight matrices and build a neighbours list based on regions with contiguous boundaries.

passengergrid_day_total <- passengergrid_day %>%
  mutate(TOTAL_TRIPS = WEEKDAY_TRIPS + WEEKENDS_HOLIDAYS_TRIPS)
wm_q_day <- poly2nb(passengergrid_day_total, queen=TRUE)
summary(wm_q_day)
Neighbour list object:
Number of regions: 5161 
Number of nonzero links: 112788 
Percentage nonzero weights: 0.4234432 
Average number of links: 21.8539 
7 regions with no links:
1767 2407 2884 3350 4627 5154 5222
Link number distribution:

  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19 
  7  12  17  27  36  89  67 103  85  71 125 135 105 140 150 139 189 191 222 153 
 20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39 
186 161 244 261 220 212 200 205 196 173 117 128 135 117 114  69  62  49  34  50 
 40  41  42  43  44  45  46  49 
 46  52  11  25  13   6   4   8 
12 least connected regions:
34 251 313 1543 3507 3507.1 5129 5159 5278 5278.1 5558 5558.1 with 1 link
8 most connected regions:
3292 3292.1 3292.2 3292.3 3292.4 3292.5 3292.6 3292.7 with 49 links

The output shown is a summary of a neighbors list object using Queen contiguity weight matrix. There are 5,029 regions and approximately 42.69% of all possible neighbor pairs are neighbors with nonzero links. Each region has an average of 21.47 neighboring regions. For poorly connected regions, there are 7 regions without neighbors and 14 regions with 1 neighbour. There are 8 well-connected regions with 48 neighbours.

passengergrid_time_total <- passengergrid_time %>% 
  mutate(TOTAL_TRIPS = WEEKDAY_MORNING_PEAK + WEEKDAY_AFTERNOON_PEAK + WEEKDAY_NON_PEAK +
           WEEKENDS_HOLIDAYS_MORNING_PEAK + WEEKENDS_HOLIDAYS_EVENING_PEAK + WEEKENDS_HOLIDAYS_NON_PEAK)
wm_q_time <- poly2nb(passengergrid_time_total, queen=TRUE)
summary(wm_q_time)
Neighbour list object:
Number of regions: 5029 
Number of nonzero links: 107980 
Percentage nonzero weights: 0.426953 
Average number of links: 21.47147 
7 regions with no links:
1767 2407 2884 3350 4627 5154 5222
Link number distribution:

  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19 
  7  14  17  30  28  94  67 103  82  81 116 137 122 137 130 159 166 230 197 163 
 20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39 
157 177 257 237 209 197 216 217 181 185 127 128  93 121  97  47  62  48   9  27 
 40  41  42  43  44  48 
 56  47  17  19  10   8 
14 least connected regions:
34 251 313 1543 2135 2135.1 3507 3507.1 5129 5159 5278 5278.1 5558 5558.1 with 1 link
8 most connected regions:
3292 3292.1 3292.2 3292.3 3292.4 3292.5 3292.6 3292.7 with 48 links

The output shown is a summary of a neighbors list object. There are 5,029 regions and approximately 42.69% of all possible neighbor pairs are neighbors with nonzero links. Each region has an average of 21.47 neighboring regions. For poorly connected regions, there are 7 regions without neighbors and 14 regions with 1 neighbour. There are 8 well-connected regions with 48 neighbours.

Method 2: Computing distance based neighbours

passengergrid is made up of polygons features, so we will need to get points in order to make our connectivity graphs. The most typically method for this will be polygon centroids.

longitude <- map_dbl(passengergrid_time_total$area_hexagon_grid, ~st_centroid(.x)[[1]])
latitude <- map_dbl(passengergrid_time_total$area_hexagon_grid, ~st_centroid(.x)[[2]])
coords <- cbind(longitude, latitude)
head(coords)
     longitude latitude
[1,]  3970.122 28214.15
[2,]  4220.122 28647.16
[3,]  4470.122 30812.23
[4,]  4720.122 28647.16
[5,]  4720.122 30379.22
[6,]  4720.122 30379.22
k1 <- knn2nb(knearneigh(coords))
k1dists <- unlist(nbdists(k1, coords, longlat = TRUE))
summary(k1dists)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    0.0     0.0     0.0   402.8     0.0 16228.8 

The summary report shows that the largest first nearest neighbour distance is 16228.8km, so using this as the upper threshold gives certainty that all units will have at least one neighbour.

wm_d16229 <- dnearneigh(coords, 0, 16229, longlat = TRUE)
wm_d16229
Neighbour list object:
Number of regions: 5029 
Number of nonzero links: 22604344 
Percentage nonzero weights: 89.37759 
Average number of links: 4494.799 
# Make a new list with only the first 5 elements
wm_d16229_subset <- wm_d16229[1:5]

# Now use str() on this subset
str(wm_d16229_subset)
List of 5
 $ : int [1:4468] 2 4 5 6 8 9 10 11 12 13 ...
 $ : int [1:4679] 1 3 4 5 6 7 8 9 10 11 ...
 $ : int [1:4681] 2 4 5 6 7 8 9 10 11 12 ...
 $ : int [1:4703] 1 2 3 5 6 7 8 9 10 11 ...
 $ : int [1:4562] 1 2 3 4 6 7 8 9 10 11 ...

derive a spatial weight matrix based on Inversed Distance method.

#dist_day <- nbdists(wm_q_day, coords, longlat = TRUE)
#ids_day <- lapply(dist, function(x) 1/(x))
#ids_day

1.6.2 Display the LISA maps of the passengers trips generate by origin at hexagon level.

1.7 Emerging Hot Spot Analysis (EHSA)

1.7.1 Perform Mann-Kendall Test by using the spatio-temporal local Gi* values.

1.7.2 Prepared EHSA maps of the Gi* values of the passenger trips by origin at the hexagon level.

weekday evening peak passenger trips by origin; hexagons dont not have shared boundary; isolated with no neighbours but a hotspot.